Abstract. Built-up land is an important component of the overall landscape.
In this paper the structure of the built-up land in the largest cities in the
Czech Republic and in selected cities in the U.S.A. is analyzed using the
framework of statistical physics. We calculate the variance of the total area
and of the count of the built-up land plots contained inside discs of different
radii. In both cases the variance as a function of the disc radius follows a
power law with exponents that are comparable across different cities. The study
is based on cadastral data from the Czech Republic and on building footprints
from GIS data in the U.S.A.



1. Introduction 

Urban land represents one of the most significant fingerprints of human activity
on the Earth. The creation and development of its structure is influenced by
cultural, sociological, economic, political and other conditions. Despite the
apparent complexity, some simple universal properties and rules have been found.
The classic example is the rank-size distribution of cities, first mentioned by
Auerbach [1] and later discussed by Zipf [2]. They claimed that if cities are
ranked by the number of inhabitants, then the rank-size distribution follows a
power law with an exponent close to -1 (see also [3]).

From the physical point of view it is interesting to study spatial properties of the 
urban structure. Existing studies [TJ |U [5J |S1 El 121 IE] focused especially on the fractal 
structure and the related scaling-laws of urban clusters. The analyzed pattern is 
usually given by some coarse-grained map of the spatial population spread and 
the general claim is that the spatial structure of large urban areas is influenced by 
certain long-range spatial correlations. Motivated by those properties several urban 
models were introduced QJ H M ESI EH E2] • 

Our aim here is to study the urban structure in a different representation and on
much smaller scales. The analyzed data consist of the exact positions of buildings
in the cities, where we can expect the urbanization to vary only slowly with
time. The vertical projection of buildings onto the ground gives a built-up land
pattern that forms a natural representation of the city, since it is clearly visible
and identifiable. Built-up land is usually stable over short and medium time
periods such as days, months and even years, except for rapid changes during
disasters. It is natural to expect that the built-up land pattern is correlated with
the population spread, since human life is strongly connected with buildings. The
exact relation, however, can be rather complicated, as it depends on many additional
features such as the whole 3D shape (capacity) or the usage (living, working) of
every building, and can in general vary during the day and over short time periods
(e.g. weekends, vacations).

In this paper we show that the correlation and fluctuation properties of such a
built-up land pattern in the city centres are similar to those of critical systems
in thermodynamics. This can represent a connection between urban and critical
systems. It can be especially helpful when discussing the concept of
self-organized criticality [13], which was introduced for urban systems by Batty
and Xie [14] with a fractal dimension as the criticality indicator.

A connection to critical systems (phase transitions) can originate from the
fact that, economically, the conversion of land to the built-up type represents a
change of phase: the land acquires an additional property, a building.

2. Critical phenomena 

Let us briefly recall the correlation and fluctuation properties typical for
thermodynamic systems near the critical point (e.g. [15, 16, 17]). For a further
discussion of the scaling properties in complex systems see [18].

The most important property of the static spatial structure of a critical system
is its scale invariance. In simple terms, if a part of the system is magnified to
the same size as the original system, it is not possible to distinguish between the
magnified part and the original system.

In order to describe these features explicitly let us define a local order parameter
$m$ as a quantity that solely describes the microscopic state of the system in one
realization. Thus $m(\mathbf{r})$ is the value of the parameter (e.g. density, local
magnetization or a boolean indicator of the occurrence of some property) at the
position $\mathbf{r} \in V$, where $V \subset \mathbb{R}^2$ is the total volume
occupied by the 2-dimensional system living in the plane.

Suppose there exists a whole ensemble of realizations (different cities can be
treated as different realizations) and denote by $\langle \dots \rangle$ the
ensemble average. We say that the system is homogeneous and isotropic in the
volume $V$ if

(1)  $\langle m(\mathbf{r}) \rangle = \langle m(\mathbf{0}) \rangle = m, \quad \forall \mathbf{r} \in V.$

This means that the mean value of the local order parameter is independent of
the position in the volume. From now on we assume the system to be homogeneous
and isotropic.

The spatial properties of the order parameter distribution can be described by
the two-point correlation function defined as

(2)  $G(\mathbf{r}_1, \mathbf{r}_2) = \big\langle \big(m(\mathbf{r}_1) - \langle m(\mathbf{r}_1) \rangle\big) \big(m(\mathbf{r}_2) - \langle m(\mathbf{r}_2) \rangle\big) \big\rangle.$

Under the homogeneity and isotropy assumptions this simplifies to

(3)  $G(\mathbf{r}_1, \mathbf{r}_2) = G(\mathbf{r}_2 - \mathbf{r}_1) = G(r) = \langle m(\mathbf{r})\, m(\mathbf{0}) \rangle - m^2,$

where $\mathbf{r} = \mathbf{r}_2 - \mathbf{r}_1$ and $r = |\mathbf{r}|$. Therefore the correlation function depends only on
the relative distance $|\mathbf{r}| = |\mathbf{r}_2 - \mathbf{r}_1|$ of the two points $\mathbf{r}_1$ and $\mathbf{r}_2$.

The scaling assumption for systems at the critical point can be written in the
following form¹:

(4)  $G(r) \sim r^{-\eta}, \quad 0 < \eta < 2.$

The index $\eta$ appearing in the exponent of the power-law part of $G(r)$ is called the
anomalous dimension. For systems outside the critical point the correlation function
decays with increasing $r$ much faster (usually exponentially).

The definition of the order parameter can be extended to systems composed of
point particles. Here, the empirical density function $\rho$ is taken as the local
order parameter. For particles located at points
$\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3, \mathbf{r}_4, \dots \in \mathbb{R}^2$ it is given by

(5)  $\rho(\mathbf{r}) = \sum_{i=1}^{\infty} \delta(\mathbf{r} - \mathbf{r}_i).$

It is well known [17] that the correlation function for such a density can be
decomposed into

(6)  $G(\mathbf{r}_2 - \mathbf{r}_1) = \rho\, \delta(\mathbf{r}_2 - \mathbf{r}_1) + Q(\mathbf{r}_2 - \mathbf{r}_1),$

where $Q(\mathbf{r}_2 - \mathbf{r}_1)$ is the non-diagonal part of the form (4) defined for
$r = |\mathbf{r}_2 - \mathbf{r}_1| > 0$. The difference from the ordinary order parameter is thus
only in the diagonal $\delta$ term, which does not influence the character of the
divergence in the vicinity of the critical point.

2.1. Parameter variance in discs. A useful tool for analyzing experimental
data is the variance of the parameter value inside discs. For a parameter $m(\mathbf{r})$
with a homogeneous and isotropic distribution, $\langle m(\mathbf{r}) \rangle = m$, the cumulative
value of the parameter in a disc of radius $R$ is given by

(7)  $M(R) = \int_{S(R)} m(\mathbf{r})\, d\mathbf{r},$

where the disc is the set $S(R) = \{\mathbf{r} \in \mathbb{R}^2, |\mathbf{r}| < R\}$ with volume $|S(R)|$. The
position of the centre of the disc is not important due to the homogeneity of the
parameter distribution. The parameter variance is defined [19] as

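(8)  $\sigma^2(R) = \langle M(R)^2 \rangle - \langle M(R) \rangle^2,$

where

(9)  $\langle M(R) \rangle = \int_{S(R)} \langle m(\mathbf{r}) \rangle\, d\mathbf{r} = m\, |S(R)|,$

and

(10)  $\langle M(R)^2 \rangle = \int_{S(R)} \int_{S(R)} \langle m(\mathbf{r}_1)\, m(\mathbf{r}_2) \rangle\, d\mathbf{r}_1\, d\mathbf{r}_2.$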

Using the definition (2) of the two-point correlation function and the
fluctuation-dissipation theorem [TB], one obtains two different asymptotic
relations for $\sigma^2(R)$:

¹ Notation: $f(x) \sim g(x) \Leftrightarrow \lim_{x \to \infty} f(x)/g(x) = c$, and $f(x) \propto g(x) \Leftrightarrow f(x)/g(x) = c, \ \forall x$.

Outside of the critical point the following relation holds:

(11)  $\sigma^2(R) \sim \langle M(R) \rangle \propto R^2, \quad R \gg 1.$

A different situation arises when the system approaches the critical point.
Spatial correlations in this region are long-ranged and the correlation function is
dominated by the power-law decay (4). This gives

(12)  $\sigma^2(R) \sim \langle M(R) \rangle^{2 - \eta/2} \propto R^{4 - \eta}, \quad 1 \ll R \ll \xi,$

where $\xi$ is the correlation length of the system, which goes to infinity as the
system approaches the critical point. The correlation length can be understood as
the range of interactions.
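
As a brief sketch of where the exponents in (12) come from, one can rescale the
integration variables in the double integral of the correlation function over the
disc. This is only a heuristic scaling argument added for orientation: it assumes
a pure power law $G(r) = c\, r^{-\eta}$ over the whole disc (with a constant $c$
introduced here) and neglects the diagonal term and all corrections.

```latex
\begin{align*}
\sigma^2(R) &= \int_{S(R)} \int_{S(R)} G(|\mathbf{r}_2 - \mathbf{r}_1|)\, d\mathbf{r}_1\, d\mathbf{r}_2 \\
 &\approx c \int_{S(R)} \int_{S(R)} |\mathbf{r}_2 - \mathbf{r}_1|^{-\eta}\, d\mathbf{r}_1\, d\mathbf{r}_2
 \qquad (\text{substitute } \mathbf{r}_i = R\,\mathbf{x}_i) \\
 &= c\, R^{4-\eta} \int_{S(1)} \int_{S(1)} |\mathbf{x}_2 - \mathbf{x}_1|^{-\eta}\, d\mathbf{x}_1\, d\mathbf{x}_2
 \;\propto\; R^{4-\eta}, \\
\langle M(R) \rangle &= m\, \pi R^2 \propto R^2
 \quad\Longrightarrow\quad
 \sigma^2(R) \propto \langle M(R) \rangle^{(4-\eta)/2} = \langle M(R) \rangle^{2-\eta/2}.
\end{align*}
```

The remaining integral over the unit disc is finite for $0 < \eta < 2$, so it only
contributes a constant prefactor.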

Thus, in order to determine the criticality of a thermodynamic system, one can
study its fluctuations. If the dependence of $\sigma^2(R)$ on $\langle M(R) \rangle$ follows a power
law with an exponent larger than 1, then the system is close to the critical point.
The breakdown of the power law (12) for very large $R$ is connected with reaching
the correlation length. However, because of the complicated relation between the
correlation function $G(r)$ and $\sigma^2(R)$, a more appropriate way to determine the
correlation length $\xi$ is a direct study of the correlation function and of the
breakdown of the scaling form (4).

3. Data analysis 

In this part we show how the method from the previous section can be applied to
the built-up land pattern. The analyzed data consist of two different datasets:

Cadastral records in the Czech Republic 

The first dataset is formed by the cadastral records stored by COSMC (Czech
Office for Surveying, Mapping and Cadastre). In general, cadastral records
contain information about the division of the overall landscape into the smallest
unique pieces of land, the land plots (parcels). In the Czech Republic, every land
plot $i$ is characterised by its definition point $\mathbf{r}_i$, exact geodetic shape, size
(acreage) $\lambda_i$, type of land and the ownership information. Our data contain all of
this information except for the exact shape, for all land plots in the Czech
Republic (CR). The only geodetic information is thus the definition point of the
land parcel, which is a point located approximately at the centroid of the parcel.
An example of these data is shown in figure 1.

Since our interest is in the built-up structure, we restrict our attention to the
built-up land plots only (a built-up land plot is a plot with a building on it).

Building footprints in the U.S.A. 

The second part of our data, the building footprints, is part of the GIS data
in the ESRI Shapefile format available for a few U.S.A. cities on the Internet (see
section Resources for links). The building footprints are represented by polygons in
the plane. Those polygons reflect the vertical projection of the overlying buildings
onto the ground. A visualisation of a small part of the data is shown in figure 2.

3.1. Representation. In order to analyse those different datasets we use two
different but straightforward representations:

Point representation is given by the definition points (CR dataset) resp. the
centroids of polygons (U.S.A. dataset). For every city this gives a set of points
$\{\mathbf{r}_i, i \in I\}$. The order parameter that characterizes such a point pattern is the
singular point density $\rho(\mathbf{r})$ given by (5). The parameter variance $\sigma^2(R)$ then
means the variance of the number of points inside discs.

Figure 1. Example of the Czech Republic dataset. Blue points represent the
definition points of the parcels. The polygon data clearly showing the real shape
of the parcels are not accessible to our analysis.

Figure 2. Example of the U.S.A. dataset. Building footprints are defined as
polygons. Centroids of polygons are plotted as blue points.
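
For the U.S.A. dataset the point representation requires the centroid of every
footprint polygon. The source does not state how the centroids were computed; one
common choice, sketched here in Python under that assumption, is the area-weighted
centroid obtained from the shoelace formulas. The function name and the
list-of-vertices input are hypothetical.

```python
def polygon_area_centroid(vertices):
    """Return (area, (cx, cy)) of a simple polygon given as a list of (x, y) vertices.

    Uses the standard shoelace formulas; the vertex order may be clockwise
    or counter-clockwise. The polygon must not be degenerate (zero area).
    """
    a2 = 0.0   # twice the signed area
    cx = 0.0
    cy = 0.0
    n = len(vertices)
    for k in range(n):
        x0, y0 = vertices[k]
        x1, y1 = vertices[(k + 1) % n]   # next vertex, wrapping around
        cross = x0 * y1 - x1 * y0
        a2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if a2 == 0.0:
        raise ValueError("degenerate polygon with zero area")
    centroid = (cx / (3.0 * a2), cy / (3.0 * a2))
    return abs(a2) / 2.0, centroid
```

The same routine also returns the footprint area, which is the quantity needed
when a polygon is replaced by a circle of equal size.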

The estimation of $\sigma^2(R)$ for a given radius $R$ is done in the following way: Inside
the investigated part of the city $A \subset \mathbb{R}^2$ we uniformly and randomly choose the
centres $\mathbf{o}_j$ of $N$ discs, $N \gg 1$ (usually $N = 2000$), so that every disc is a subset
of $A$, $S(\mathbf{o}_j, R) = \mathbf{o}_j + S(R) \subset A$. For every disc $S(\mathbf{o}_j, R)$ the number of inner
points $N_j(R)$ is calculated,

(13)  $N_j(R) = \int_{S(\mathbf{o}_j, R)} \rho(\mathbf{r})\, d\mathbf{r}.$

The mean value is then estimated by

(14)  $\langle N(R) \rangle = \frac{1}{N} \sum_j N_j(R)$

and the variance by

(15)  $\sigma^2(R) = \frac{1}{N-1} \sum_j \big(N_j(R) - \langle N(R) \rangle\big)^2.$
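
The estimation procedure of (13)-(15) can be sketched in a few lines of Python.
This is only a minimal illustration, not the code used for the study: the
rectangular working area, the brute-force point counting and the function name
`disc_count_stats` are assumptions made here for the sake of a self-contained
example.

```python
import random
from statistics import mean, variance

def disc_count_stats(points, x_min, y_min, x_max, y_max, radius, n_discs, rng):
    """Estimate <N(R)> and sigma^2(R) of the point count in random discs.

    points  -- list of (x, y) definition points or centroids
    x_min.. -- rectangular working area A (an assumption of this sketch)
    rng     -- random.Random instance, so that runs are reproducible
    """
    r2 = radius * radius
    counts = []
    for _ in range(n_discs):
        # Choose the centre o_j uniformly so that the whole disc lies inside A.
        ox = rng.uniform(x_min + radius, x_max - radius)
        oy = rng.uniform(y_min + radius, y_max - radius)
        # N_j(R): brute-force count of the points inside the disc, eq. (13).
        n_j = sum(1 for x, y in points
                  if (x - ox) ** 2 + (y - oy) ** 2 < r2)
        counts.append(n_j)
    # statistics.variance uses the 1/(N-1) normalisation of eq. (15).
    return mean(counts), variance(counts)


# Usage: uniformly random points versus a regular square lattice.
rng = random.Random(0)
random_points = [(rng.random(), rng.random()) for _ in range(4000)]
lattice_points = [((i + 0.5) / 60, (j + 0.5) / 60)
                  for i in range(60) for j in range(60)]
mean_rand, var_rand = disc_count_stats(random_points, 0, 0, 1, 1, 0.1, 400, rng)
mean_lat, var_lat = disc_count_stats(lattice_points, 0, 0, 1, 1, 0.1, 400, rng)
```

For the random points the variance is expected to be of the same order as the
mean, while the regular lattice should give a much smaller variance at a
comparable mean. The brute-force count is quadratic in the data size; for a whole
city a spatial index would be the more practical choice.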

An example of the working area selection for Prague is shown in figure 3.

Set representation better reflects the existing structure of the built-up land. Let
us assume for the moment that we work only with the second dataset. The built-up
land is then given by non-intersecting polygons $\{p_i \subset \mathbb{R}^2, i \in I\}$. Such a set can be
represented as a subset of the plane given by the boolean order parameter

(16)  $m(\mathbf{r}) = \begin{cases} 1 & \text{building at } \mathbf{r} \ (\exists i \in I,\ \mathbf{r} \in p_i), \\ 0 & \text{otherwise} \ (\forall i \in I,\ \mathbf{r} \notin p_i). \end{cases}$



The parameter variance $\sigma^2(R)$ can be estimated by use of (8).

If we want to deal with the unknown shapes of the built-up parcels in the Czech
case, a reasonable way is to approximate them by discs with the same area.
Such an approximation, however, cannot lead to the same (set) definition of the
order parameter, because we generally cannot prevent the discs from overlapping.
This is actually not a problem: we can introduce an equivalent process of
estimating $\sigma^2(R)$ that can be easily extended to the overlapping approximation.

Figure 3. The point representation of Prague built-up land plots.
The analyzed area is bounded by the red rectangle. The randomly
positioned centres $\mathbf{o}_j$ of 200 discs are plotted as red points.

Let us now suppose that our built-up parcels are given as a set $\{p_i \subset \mathbb{R}^2, i \in I\}$,
but that the non-intersecting property is generally not valid. This set is easily
obtained for both datasets. In the Czech case $p_i$ is the disc of the same area $\lambda_i$
as the $i$-th land plot, located at its definition point $\mathbf{r}_i$,

(17)  $p_i = \{\mathbf{r} \in \mathbb{R}^2, |\mathbf{r} - \mathbf{r}_i| \le \sqrt{\lambda_i / \pi}\}.$

In the U.S.A. $p_i$ remains the polygon of the $i$-th building.

The estimation of $\sigma^2(R)$ for a given radius $R$ is then done as follows: Inside the
studied part of the city $A \subset \mathbb{R}^2$ we uniformly and randomly choose the centres $\mathbf{o}_j$
of $N$ discs, $N \gg 1$, so that every disc is contained in $A$,
$S(\mathbf{o}_j, R) = \mathbf{o}_j + S(R) \subset A$. For every disc $S(\mathbf{o}_j, R)$ the built-up area $M_j(R)$
inside it is calculated by the relation

(18)  $M_j(R) = \sum_{i \in I} \lambda\big(p_i \cap S(\mathbf{o}_j, R)\big),$

where $\lambda(A)$ stands for the area of the set $A$. In other words, we accumulate the
area of every intersection of the land plot $p_i$ with the given disc $S(\mathbf{o}_j, R)$ over
all land plots.
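
For the disc approximation of the Czech parcels, every term in the sum over land
plots is the area of the intersection of two discs, for which a closed-form
expression exists. The following Python sketch uses the standard circle-circle
intersection (lens) formula. It is an illustration under the assumptions stated in
the comments (parcels given as hypothetical `(x, y, area)` tuples, hypothetical
function names), not the authors' implementation.

```python
import math

def disc_intersection_area(d, r1, r2):
    """Area of the intersection of two discs with radii r1, r2 and centre distance d."""
    if d >= r1 + r2:                 # discs are disjoint
        return 0.0
    if d <= abs(r1 - r2):            # the smaller disc lies inside the larger one
        return math.pi * min(r1, r2) ** 2
    # Partial overlap: two circular sectors minus the kite spanned by the
    # two centres and the two intersection points of the circles.
    a1 = r1 * r1 * math.acos((d * d + r1 * r1 - r2 * r2) / (2 * d * r1))
    a2 = r2 * r2 * math.acos((d * d + r2 * r2 - r1 * r1) / (2 * d * r2))
    kite = 0.5 * math.sqrt((-d + r1 + r2) * (d + r1 - r2)
                           * (d - r1 + r2) * (d + r1 + r2))
    return a1 + a2 - kite

def built_up_area_in_disc(parcels, ox, oy, radius):
    """Built-up area M_j(R) for parcels approximated by discs of equal area.

    parcels -- list of (x, y, area) tuples (definition point and acreage);
               this data layout is an assumption of the sketch.
    """
    total = 0.0
    for x, y, area in parcels:
        parcel_radius = math.sqrt(area / math.pi)   # disc with the same area
        d = math.hypot(x - ox, y - oy)
        total += disc_intersection_area(d, parcel_radius, radius)
    return total
```

Because overlapping parcel discs are simply summed, the routine does not need the
non-intersecting property, in line with the procedure described above.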

The process of calculating the area inside the disc $S(\mathbf{o}_j, R)$ for polygons is
depicted in figure 4.

Figure 4. An example of the built-up area calculation for the set
representation of the U.S.A. dataset. The area $M_j(R)$ of polygons
inside the disc $S(\mathbf{o}_j, R)$ is marked in red.

The calculation of $M_j(R)$ for the polygons (U.S.A. dataset) is exact. On the
other hand, the disc approximation gives only an approximate result. The error
is however not large, and it is easy to show that the effective error is inversely
proportional to $R$,

(19)  $\frac{|M_j(R) - \widetilde{M}_j(R)|}{\widetilde{M}_j(R)} \propto \frac{1}{R},$

with increasing $R$. Here $\widetilde{M}_j(R)$ is the correct built-up area (acreage) inside the
$j$-th disc. This is because for $R$ much larger than the typical parcel radius,
deviations are produced only in the vicinity of the boundary of the large disc.
The mean value is estimated by

(20)  $\langle M(R) \rangle = \frac{1}{N} \sum_j M_j(R)$

and the variance by

(21)  $\sigma^2(R) = \frac{1}{N-1} \sum_j \big(M_j(R) - \langle M(R) \rangle\big)^2.$

The estimations for both representations are based on the assumption of the
self-averaging property [20, 21]. It means that a sufficiently large sample is a good
representative of the whole ensemble. In our case, however, the size of the sample
is limited by the size of the city centre. By the city centre we mean the area around
the city core (central 'plateau'), where the built-up density is virtually constant [8].
In contrast to the edges of the city, this part of the city does not participate in
the process of massive urbanization. It thus represents a structure in a steady
(slowly varying) state. This does not exclude some local urbanization changes,
which are always present. In order to speak about a virtually uniform density it is
also necessary to omit from the analysis certain locations, such as lakes or hilly
places, where it is impossible to construct a building. Otherwise the fluctuations
will increase.



4. Results 

We analyzed the 6 largest cities in the Czech Republic and 6 cities in the U.S.A.
In the centre of every city we calculated both the point number variance in discs
(point representation) and the built-up area variance in discs (set representation)
for different values of the disc radius $R$. The diameter of the city centre for a
typical large Czech city is about 4 km. This size limits the maximal radius $R$ of
the discs for which reasonable statistics can be obtained. Together with the fact
that the power-law dependence, if present, can theoretically be reached only for
$R \gg 1$, this led us to study the fluctuations inside the region 200 m < R < 1000 m.




Figure 5. Dependencies of $\sigma^2(R)$ for the set representation on
$\langle M(R) \rangle$ for different cities in the Czech Republic.



The obtained dependencies of $\sigma^2(R)$ on $\langle M(R) \rangle$ for the set representation are
shown in figures 5 and 6. It is clearly visible that for the set representation the
result follows a power law (inside the analysed radius range). The same behaviour is
also found for the point representation. Thus in the studied range the fluctuations
behave as

(22)  $\sigma^2(R) \propto \langle M(R) \rangle^{\alpha}, \quad \alpha = 2 - \frac{\eta}{2}.$

We determine the values of the exponent $\alpha$ by performing a linear regression on
the logarithmic relation

(23)  $\log(\sigma^2(R)) = \alpha \log(\langle M(R) \rangle) + \beta.$
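
The fit of the exponent can be sketched as an ordinary least-squares line through
the points in log-log coordinates. The Python snippet here is a minimal
illustration with a hypothetical function name and purely synthetic input data;
the source does not specify which regression routine was actually used.

```python
import math

def fit_power_law(mean_values, variances):
    """Least-squares fit of log(variance) = alpha * log(mean) + beta.

    Returns (alpha, beta). All inputs must be positive; natural logarithms
    are used (the slope alpha does not depend on the base).
    """
    xs = [math.log(m) for m in mean_values]
    ys = [math.log(v) for v in variances]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    beta = y_bar - alpha * x_bar
    return alpha, beta

# Usage with synthetic data that follow variance = 3 * mean^1.62 exactly.
means = [10.0 * 2 ** k for k in range(8)]
variances = [3.0 * m ** 1.62 for m in means]
alpha, beta = fit_power_law(means, variances)
```

With real data the pairs of mean and variance would come from the estimators
evaluated at a series of radii inside the analysed range.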
Figure 6. Dependencies of $\sigma^2(R)$ for the set representation (polygon)
on $\langle M(R) \rangle$ for different cities in the U.S.A.

Table 1. Exponents $\alpha$ according to the power-law dependence
(22) of $\sigma^2(R)$ on $\langle M(R) \rangle$. The 2nd and 5th columns contain the values
of the exponent for the point representation. The 3rd and 6th
columns contain the values for the set representation with the circular
approximation of the built-up units (for the U.S.A. dataset: circles
with the origin at the centroid and the same size as the actual polygon).
The 7th column contains the values for the set representation using
the actual known polygonal shape of the building footprints.

City              Points  Area (circle)   City         Points  Area (circle)  Area (polygon)
Praha             1.47    1.64            Raleigh      1.73    1.58           1.58
Plzeň             1.61    1.69            Pittsburgh   1.62    1.62           1.62
Liberec           1.54    1.65            Boston       1.69    1.68           1.69
Brno              1.40    1.65            Spokane      1.69    1.57           1.59
České Budějovice  1.50    1.58            Tompkins     1.75    1.57           1.59
Ostrava           1.54    1.62            Springfield  1.52    1.30           1.30



The dependency of $\langle M(R) \rangle$ on $R$ resp. of $\sigma^2(R)$ on $R$ follows a power law with
exponent 2 resp. $2\alpha$, as predicted by the homogeneity assumption resp. the
relation (22).

The summary of the resulting exponents $\alpha$ for the studied cities is given in table 1.
As follows from (11), $\alpha = 1$ stands for a system that is outside of the critical
region, e.g. randomly positioned particles. One can see that this is not the case for
the built-up land pattern.
The values of the exponent for the built-up land pattern are for both
representations much larger than 1. In the case of the point pattern the average
value of the exponent is $\alpha_p = 1.60$. We can see a systematic difference between the
Czech cities (lower values) and the American ones (larger values).

More interesting results arise for the set representation. There is no clear
systematic difference in this case between the Czech Republic and the U.S.A. The
average value of the exponent is $\alpha_a = 1.62$, with much lower fluctuations around
this value. The only significant deviation in the power-law exponent is represented
by the City of Springfield (Clark County). Such a result can be explained by
its constrained lattice-like structure (see figure 7). Because the fluctuations are
influenced by the local inhomogeneities of the pattern, the lattice-like structure
produces a more homogeneous distribution (with lower fluctuations) of the
built-up land than in the other, more "organic" cities [4]. Theoretically, in the case
of an exactly rigid structure with identical buildings placed at the vertices of a
perfect lattice, the fluctuations would decrease further and the coefficient $\alpha < 1$.
For further information on this so-called superhomogeneous distribution see [Hi].




Figure 7. The constrained structure of Springfield. The built-up 
land is plotted in its exact polygonal representation. 

Comparing the values of the exponent $\alpha$ in the 6th and 7th columns of table 1,
we can conclude that the approximation of an unknown parcel shape by a circle
does not produce a significant error.

Let us now make a short comment on the analysed range of $R$. In our analysis
we used the interval (200 m, 1000 m). For some large cities in the U.S.A. we were
able to increase the upper bound to 3 km, because the analysed city centre area
can be taken larger than for the Czech cities, which are much smaller. The
power-law dependence remains unchanged in the extended range (see figure 8 for
Pittsburgh in the set representation).

A further extension of the region can also be calculated, but the assumption
of homogeneity (constant density) of the distribution is then obviously not
valid, since the density decreases with increasing distance from the city centre.
This influences the presented estimations of $\sigma^2(R)$, because the self-averaging
cannot be applied. So even if the correlations may be evaluated at much larger
distances, it is not proper to use this method outside of the theoretically
homogeneous region.
Figure 8. Fluctuation dependence in Pittsburgh for the set representation
(polygons) in the region $R \in (200, 3000)$ m. The straight line
represents the linear regression fit according to the logarithmic relation (23).

For the polygon representation of the built-up land in the U.S.A. it is also possible
to estimate the correlation function directly. Results for Raleigh and Boston are
shown in figure 9. The decay of the correlation function clearly follows a power law,
and the exponents are consistent with the values of $\alpha$ from table 1 and the
relation (22). The same consistency holds for the rest of the studied cities.
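
As a quick arithmetic check of this consistency, the relation between $\alpha$ and
$\eta$ can be inverted and evaluated with the polygon-representation exponents
from table 1. The resulting values are the ones quoted in the caption of figure 9.

```latex
\begin{align*}
\eta &= 2\,(2 - \alpha) = 4 - 2\alpha, \\
\text{Raleigh:}\quad \alpha = 1.58 &\;\Rightarrow\; \eta = 4 - 3.16 = 0.84, \\
\text{Boston:}\quad \alpha = 1.69 &\;\Rightarrow\; \eta = 4 - 3.38 = 0.62.
\end{align*}
```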

Finally, we generated an artificial city by positioning the centres of the buildings
randomly. Buildings were approximated by circles with sizes drawn from the
same distribution as for a real city. A part of this artificial city is shown in
figure 10. The correlation function for such a pattern, compared with the correlation
function for Raleigh, is shown in figure 11. The fluctuation properties of such a
pattern are consistent with the predictions given by (11).
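
A possible way to generate such an artificial pattern is sketched here in Python.
The source does not say how the building sizes were sampled; resampling the
observed footprint areas with replacement is an assumption of this sketch, as are
the rectangular domain and the function name.

```python
import math
import random

def generate_random_city(n_buildings, width, height, observed_areas, rng):
    """Return a list of (x, y, radius) circles with uniformly random centres.

    observed_areas is a non-empty list of footprint areas from a real city;
    each artificial building gets an area resampled from it (with replacement),
    so the size distribution of the real city is reproduced on average.
    """
    city = []
    for _ in range(n_buildings):
        x = rng.uniform(0.0, width)
        y = rng.uniform(0.0, height)
        area = rng.choice(observed_areas)
        city.append((x, y, math.sqrt(area / math.pi)))
    return city
```

Since the centres are independent of each other, the circles may overlap, and the
pattern carries no spatial correlations apart from those caused by the finite size
of the circles.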

5. Conclusion 

The study shows that the dependence of the fluctuations of the size of the built-up
area on its mean value follows a power law. Moreover, the set representation of the
plots seems to lead to more universal results. The values of the exponent $\alpha$ in the
relation $\sigma^2(R) \sim \langle M(R) \rangle^{\alpha}$ for different cities (except for Springfield) are all
very close to the value $\alpha = 1.62$. The different result for Springfield can be
explained by the strongly constrained lattice-like structure of that city.

We can conclude that the inner urban area structure is correlated with a
long-ranged power-law dependence. The power-law exponent seems to be independent
of the particular city. Such an observation is interesting, and the possible
connection between the urban area correlations and the correlations in critical
systems may be useful for the development and verification of further urban models.
The fact that the inner (quasi-stable) part of the city has certain city-independent
properties supports the hypothesis of self-organized criticality in urban systems [14].
Figure 9. The correlation function for Raleigh and Boston (horizontal axis:
r [m], logarithmic scale). Straight lines represent the linear regression fit of
the correlation function tails. The expected power-law tail coefficients $\eta$,
calculated from the respective values of $\alpha$ in table 1 (according to (22)), are
$\eta = 0.84$ for Raleigh, resp. $\eta = 0.62$ for Boston.




Figure 10. Part of a randomly generated city. 

However, we are not able to study the process of self-organization in detail,
because our data do not reflect the dynamics of city growth: we have only one
time snapshot for each city.
Figure 11. The correlation function for a randomly generated pattern
compared to the correlation function for Raleigh.

Resources 

The building footprints for U.S.A. cities were obtained from the following web 
sites: 

• Boston - http://www.mass.gov/mgis/database.htm 

• Raleigh - http://www.wakegov.com/gis/default.htm 

• Pittsburgh - http://www.alleghenycounty.us/dcs/gis.aspx 

• Spokane - http://www.spokanecity.org/services/gis/ 

• Tompkins - http://cugir.mannlib.cornell.edu/index.jsp 

• Springfield (Clark County) - http://gis.clark.wa.gov/gishome/ 